feat(hunspell): Add ref_path support for package-based dictionary loa…#20840

Merged
cwperks merged 2 commits into opensearch-project:main from shayush622:feat/hunspell-ref-path-core
Mar 18, 2026

@shayush622

Description

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter in the hunspell token filter. This enables multi-tenant dictionary isolation where each package can have its own hunspell dictionaries independent of the traditional config/hunspell/ location.

Key changes:

  • ref_path parameter: New optional parameter in the hunspell token filter that specifies a package ID. When used with locale, dictionaries are loaded from config/packages/{ref_path}/hunspell/{locale}/
  • Cache key strategy: Package-based dictionaries use {packageId}:{locale} cache keys to avoid collisions with traditional locale-only keys
  • Cache management: Added invalidateDictionary(), invalidateDictionariesByPackage(), and invalidateAllDictionaries() methods to HunspellService for programmatic cache control
  • Security validation: Path traversal prevention, separator injection protection, and null byte checks in validatePackageIdentifier()
  • Hot-reload support: updateable: true flag enables _reload_search_analyzers API support for package-based dictionaries
  • Backward compatible: Traditional locale-only loading (config/hunspell/{locale}/) continues to work unchanged
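
The cache key strategy and the identifier checks listed above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `CACHE_KEY_SEPARATOR` and `validatePackageIdentifier` are names that appear in this PR, but the signature used here, the exact set of checks, and the `buildPackageCacheKey` helper are assumptions made for the example.

```java
/** Illustrative sketch only; not the code merged in this PR. */
final class PackageCacheKeySketch {

    // Separator between package ID and locale, per the "{packageId}:{locale}" key format.
    static final String CACHE_KEY_SEPARATOR = ":";

    /** Assumed signature: throws if the value could escape its directory or corrupt a cache key. */
    static void validatePackageIdentifier(String value, String settingName) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(settingName + " must not be null or empty");
        }
        if (value.indexOf('\0') >= 0) {
            throw new IllegalArgumentException(settingName + " must not contain null bytes");
        }
        if (value.contains("/") || value.contains("\\")) {
            throw new IllegalArgumentException(settingName + " must not contain path separators");
        }
        // Deliberately strict: rejects any ".." sequence, and "." on its own.
        if (value.equals(".") || value.contains("..")) {
            throw new IllegalArgumentException(settingName + " must not contain path traversal sequences");
        }
        if (value.contains(CACHE_KEY_SEPARATOR)) {
            throw new IllegalArgumentException(settingName + " must not contain '" + CACHE_KEY_SEPARATOR + "'");
        }
    }

    /** Hypothetical helper: builds the package-scoped cache key after validating both parts. */
    static String buildPackageCacheKey(String packageId, String locale) {
        validatePackageIdentifier(packageId, "ref_path");
        validatePackageIdentifier(locale, "locale");
        return packageId + CACHE_KEY_SEPARATOR + locale;
    }

    public static void main(String[] args) {
        System.out.println(buildPackageCacheKey("pkg-1234", "en_US")); // prints pkg-1234:en_US
    }
}
```

Rejecting the separator inside either part is what keeps a package-scoped key from being mistaken for a different package and locale pair, and keeps it distinct from a traditional locale-only key.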

Usage example:

{
  "type": "hunspell",
  "ref_path": "pkg-1234",
  "locale": "en_US"
}
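
For orientation, here is a rough sketch of how the two settings in this example could map to a dictionary directory. It assumes the `config/packages/{ref_path}/hunspell/{locale}/` layout given in this description; the class and method names are hypothetical, and the containment check is an extra guard added for the example rather than something taken from the diff.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; names and the containment check are assumptions. */
final class PackageDictionaryPathSketch {

    /** Resolves {configDir}/packages/{refPath}/hunspell/{locale} and refuses paths that escape it. */
    static Path resolveDictionaryDir(Path configDir, String refPath, String locale) {
        Path packagesRoot = configDir.resolve("packages").normalize();
        Path dictionaryDir = packagesRoot.resolve(refPath)
            .resolve("hunspell")
            .resolve(locale)
            .normalize();
        // After normalization the result must still sit under the packages root.
        if (!dictionaryDir.startsWith(packagesRoot)) {
            throw new IllegalArgumentException("resolved path escapes the packages directory: " + dictionaryDir);
        }
        return dictionaryDir;
    }

    public static void main(String[] args) {
        System.out.println(resolveDictionaryDir(Paths.get("config"), "pkg-1234", "en_US"));
    }
}
```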

This is Part 1 of 2 — a follow-up PR will add REST API endpoints for cache info and invalidation. Split from #20792.

Related Issues

Resolves #20712

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ba3f2c7.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ea0b5f9.

'Diff too large, requires skip by maintainers after manual review'

@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from ea0b5f9 to c77e078 Compare March 11, 2026 15:20
@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit c77e078.

'Diff too large, requires skip by maintainers after manual review'

shayush622 added a commit to shayush622/OpenSearch that referenced this pull request Mar 11, 2026
- Add GET /_hunspell/cache endpoint for viewing cached dictionary keys (cluster:monitor/hunspell/cache)
- Add POST /_hunspell/cache/_invalidate endpoint for cache invalidation (cluster:admin/hunspell/cache/invalidate)
- Support invalidation by package_id, locale, cache_key, or invalidate_all
- Add TransportHunspellCacheInfoAction and TransportHunspellCacheInvalidateAction
- Consistent response schema with all fields always present
- Register actions and REST handler in ActionModule
- Add comprehensive unit tests, REST handler tests, and integration test

Depends on opensearch-project#20840

Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>

Copilot AI left a comment

Pull request overview

This PR adds support for loading Hunspell dictionaries from package-scoped directories via a new ref_path parameter, enabling per-package dictionary isolation and cache invalidation APIs to support hot-reload workflows.

Changes:

  • Add package-based dictionary loading in HunspellService with {packageId}:{locale} cache keys and cache invalidation helpers.
  • Extend HunspellTokenFilterFactory to accept ref_path, validate identifiers, and support updateable: true via AnalysisMode.SEARCH_TIME.
  • Add/extend tests and test resources for package-based Hunspell dictionaries, and note the feature in the changelog.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| server/src/main/java/org/opensearch/indices/analysis/HunspellService.java | Implements package-based dictionary loading + cache key utilities and invalidation methods. |
| server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java | Adds ref_path parameter support, identifier validation, and hot-reload analysis mode handling. |
| server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java | Exposes HunspellService via a now-public getter for downstream wiring. |
| server/src/main/java/org/opensearch/node/Node.java | Wires HunspellService into node bootstrap path (currently introduces a constructor mismatch). |
| server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java | Adds unit tests for package-based loading, caching, and invalidation APIs. |
| server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java | Adds integration-style tests for ref_path behavior, validation, and updateable mode. |
| server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff | Adds package-based hunspell test affix file under config/packages/.... |
| CHANGELOG.md | Adds a changelog entry for the new ref_path support. |

Comment on lines +508 to +517
String prefix = packageId + CACHE_KEY_SEPARATOR; // Match keys like "pkg-1234:en_US"
int sizeBefore = dictionaries.size();
dictionaries.keySet().removeIf(key -> key.startsWith(prefix));
int count = sizeBefore - dictionaries.size();

if (count > 0) {
    logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", count, packageId);
} else {
    logger.debug("No cached dictionaries found for package: {}", packageId);
}

Copilot AI Mar 11, 2026

invalidateDictionariesByPackage() computes the invalidated count as sizeBefore - dictionaries.size(), but this can be incorrect in concurrent scenarios (entries can be added/removed while the method runs). Either document that the returned count is approximate (as done in invalidateAllDictionaries()), or compute the count based on the keys actually removed.
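
One way to count only the keys this call actually removed is sketched below. It assumes the cache is a `ConcurrentMap` (such as `ConcurrentHashMap`) that never stores null values; the class name, method signature, and generic value type are placeholders for illustration, not the service's real API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch only; assumes a ConcurrentMap-backed cache without null values. */
final class InvalidateByPackageSketch {

    static final String CACHE_KEY_SEPARATOR = ":";

    /** Removes every "{packageId}:..." entry and returns how many this call itself removed. */
    static <V> int invalidateByPackage(ConcurrentMap<String, V> dictionaries, String packageId) {
        String prefix = packageId + CACHE_KEY_SEPARATOR;
        int removed = 0;
        // ConcurrentMap key-set iteration is weakly consistent, so removing while iterating is safe.
        for (String key : dictionaries.keySet()) {
            // remove() returns the previous value only if this thread performed the removal.
            if (key.startsWith(prefix) && dictionaries.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("pkg-1234:en_US", "dictionary A");
        cache.put("pkg-1234:fr_FR", "dictionary B");
        cache.put("en_US", "traditional dictionary");
        System.out.println(invalidateByPackage(cache, "pkg-1234")); // prints 2
    }
}
```

Because each increment is tied to a successful `remove`, entries added or removed by other threads during the call do not skew the result, unlike a size difference taken before and after.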

@shayush622 shayush622 mentioned this pull request Mar 11, 2026
3 tasks
@github-actions

Persistent review updated to latest commit 0fca446

@github-actions

❌ Gradle check result for 0fca446: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from 0fca446 to 1513ae8 Compare March 18, 2026 07:41
@github-actions

Persistent review updated to latest commit 1513ae8

@github-actions

Persistent review updated to latest commit fc6800b

@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from fc6800b to 6e7ed83 Compare March 18, 2026 07:55
@github-actions

Persistent review updated to latest commit 6e7ed83

@github-actions

✅ Gradle check result for 6e7ed83: SUCCESS

@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from 6e7ed83 to e23316e Compare March 18, 2026 10:12
@github-actions

Persistent review updated to latest commit e23316e

@github-actions

❌ Gradle check result for e23316e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from e23316e to 4630c23 Compare March 18, 2026 11:44
@github-actions

Persistent review updated to latest commit 4630c23

@github-actions

✅ Gradle check result for 4630c23: SUCCESS

@github-actions

Persistent review updated to latest commit 6fd0caf

…ding

- Add ref_path parameter to HunspellTokenFilterFactory for package-based dictionaries
- Load from config/analyzers/{packageId}/hunspell/{locale}/
- Node-level cache with {packageId}:{locale} cache keys for multi-tenant isolation
- Refactor loadDictionary to accept baseDir parameter for code reuse
- Add regex allowlist validation for ref_path and locale
- Shared loadDictionaryFromDirectory for .aff/.dic file loading
- Backward compatible: traditional config/hunspell/{locale}

Signed-off-by: shayush622 <ayush5267@gmail.com>
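
As a rough illustration of the regex allowlist approach named in this commit message: the pattern below is a guess at a reasonable allowlist (letters, digits, underscore, and hyphen, starting with a letter or digit), not the expression used in the commit, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; the pattern is an assumed allowlist, not the one in the commit. */
final class AllowlistValidationSketch {

    // Hypothetical allowlist: first character alphanumeric, then letters, digits, '_' or '-'.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9][A-Za-z0-9_-]*");

    /** Returns true only if the whole value consists of allowlisted characters. */
    static boolean isAllowed(String value) {
        // matches() requires the entire input to match, so no explicit anchors are needed.
        return value != null && ALLOWED.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("pkg-1234")); // prints true
        System.out.println(isAllowed("../etc"));   // prints false
    }
}
```

An allowlist of this kind rejects path separators, traversal sequences, null bytes, and the cache key separator in one check, since none of those characters are permitted.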
@shayush622 shayush622 force-pushed the feat/hunspell-ref-path-core branch from 6fd0caf to d501490 Compare March 18, 2026 14:23
@github-actions

Persistent review updated to latest commit d501490

Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com>
@github-actions

Persistent review updated to latest commit ef1e012

@github-actions

❕ Gradle check result for ef1e012: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@cwperks cwperks merged commit 9f5d0e1 into opensearch-project:main Mar 18, 2026
35 checks passed
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Mar 20, 2026
…ding (opensearch-project#20840)

- Add ref_path parameter to HunspellTokenFilterFactory for package-based dictionaries
- Load from config/analyzers/{packageId}/hunspell/{locale}/
- Node-level cache with {packageId}:{locale} cache keys for multi-tenant isolation
- Refactor loadDictionary to accept baseDir parameter for code reuse
- Add regex allowlist validation for ref_path and locale
- Shared loadDictionaryFromDirectory for .aff/.dic file loading
- Backward compatible: traditional config/hunspell/{locale}

Signed-off-by: shayush622 <ayush5267@gmail.com>
Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com>
Signed-off-by: kkewwei <kkewwei@163.com>

Labels

skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.
